Diagnosis of weaknesses in modern error correction codes: a physics approach 
One of the main obstacles to the wider use of modern error-correction codes is that, due to the complex
behavior of their decoding algorithms, no systematic method is known for characterizing the
Bit-Error-Rate (BER). This is especially true in the weak-noise regime where many systems operate and where
coding performance is difficult to estimate because of the diminishingly small number of errors. We show how
the instanton method of physics allows one to solve the problem of BER analysis in the weak-noise range by
recasting it as a computationally tractable minimization problem.

PACS numbers: 89.70.+c, 02.50.-r



Modern technologies, as well as many natural and sociological
systems, rely heavily on a wide range of error-correction
mechanisms to compensate for their inherent unreliability and
to ensure faithful transmission, processing and storage of
information. There has been a great deal of research activity in
coding theory in the last half century, culminating in the
recent discovery of coding schemes that approach a reliability
limit set by classical information theory [4]. The problem
considered in this paper is of special interest because of a
unique feature of modern coding schemes, which is referred to
as an error floor. The error floor is a phenomenon characterized
by an abrupt degradation of the coding scheme performance, as
measured by the BER, from the so-called water-fall regime of
moderate Signal-to-Noise Ratio (SNR) to the entirely different
error-floor asymptotic achieved at high SNR. To estimate the
error-floor asymptotic in modern high-quality systems is a
notoriously difficult
task. Typical required BER values are 10^'^ for an optical 
communication system, 10^'^ for hard drive systems in per- 
sonal computers and as small as 10^^'' for storage systems 
used in banks and financial institutions. However, direct nu- 
merical methods, e.g. Monte Carlo, cannot be used to deter- 
mine BER below 10^^. 

To address this challenge we suggest a physics-inspired
approach that solves the problem of error-floor analysis.
The method is coined the "instanton" method, after a theoretical
particle in quantum physics that lasts for only an instant,
occupying a localized portion of space-time. Statistical physics
uses the word instanton to describe a microscopic configuration
which, in spite of its rare occurrence, contributes most to the
macroscopic behavior of the system [8]. Our instanton is the
most probable configuration of the noise to cause a decoding
error.

We consider a model of a general communication system
with error correction [4]. Data originating from an information
source are parsed into fixed-length words. Each word is encoded
into a longer codeword and transmitted through a noisy channel
(e.g., radio or optical link, magnetic or optical data storage
system, etc.). The decoder tries to reconstruct the original
codeword using the knowledge of the noise statistics and the
structure of the code. Error resilience is achieved at the
expense of introduced redundancy, and information theory gives
conditions for the existence of finite-redundancy error-correction
codes. However, it does not give a method for realizing decoders
of low complexity. In general there is no better way to
reconstruct the codeword that was most likely transmitted than
to compare the likelihoods of all possible codewords. However,
this Maximum Likelihood (ML) algorithm becomes intractable
already for codewords that are tens of bits long.
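
To make the cost of exhaustive ML decoding concrete, the following is a minimal Python sketch, not the authors' code. It assumes a white Gaussian channel, so that the most likely codeword is the one closest to the channel output in Euclidean distance, and it uses a hypothetical 3-bit repetition code (`H_REP3`) purely for illustration. The function names are ours.

```python
from itertools import product

def codewords(H):
    """Enumerate all +/-1 codewords of the code with parity-check matrix H.

    The scan visits all 2^N bit strings and keeps those satisfying every
    check, which is only feasible for very short codes.
    """
    n_bits = len(H[0])
    found = []
    for bits in product((1, -1), repeat=n_bits):
        satisfied = True
        for row in H:
            parity = 1
            for entry, bit in zip(row, bits):
                if entry:
                    parity *= bit
            # In +/-1 notation a check holds when the product of its bits is +1.
            if parity != 1:
                satisfied = False
                break
        if satisfied:
            found.append(bits)
    return found

def ml_decode(H, x):
    """Brute-force ML decoding for Gaussian noise: nearest codeword to x."""
    return min(codewords(H),
               key=lambda c: sum((xi - ci) ** 2 for xi, ci in zip(x, c)))

# Hypothetical toy code: 3-bit repetition code (two checks, two codewords).
H_REP3 = [[1, 1, 0],
          [0, 1, 1]]
```

For example, `ml_decode(H_REP3, [0.9, -0.2, 0.4])` compares the output against both codewords and selects the all-(+1) one. The exponential growth of the enumeration with the code length is what makes this approach impractical for real codes.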

A novel exciting era has started in coding theory with the
discovery of Low-Density Parity-Check (LDPC) and turbo codes.
These codes are special, not only because they can approach
very close to the virtually error-free transmission limit, but
mainly because a computationally efficient, so-called iterative,
decoding scheme is readily available. When operating at moderate
noise values these approximate decoding algorithms show an
unprecedented ability to correct errors, a remarkable feature
that has attracted a lot of theoretical attention
[5, 6, 11, 12, 13, 14, 15]. (Notice also an alternative
statistical-physics-inspired approach [16] that offered an
important insight into the extraordinary performance of the
iterative decoding [17, 18, 19].) It is believed that the error
floor is a fundamental consequence of iterative decoding, and
that the approximate algorithms mentioned above are incapable of
matching the performance of ML decoding beyond the error-floor
threshold. The importance of error-floor analysis was recognized
in the early stages of the turbo codes revolution [20], and it
soon became apparent that LDPC codes are also not immune from
the error-floor deficiency. The main approaches to the
error-floor analysis problem proposed to date include: (i) a
heuristic approach of the importance sampling type [6], utilizing
theoretical considerations developed for a typical randomly
constructed LDPC code performing over the very special
binary-erasure channel [23], and (ii) deriving lower bounds for
BER [24].

Our approach to the error-floor analysis is different: we
suggest an efficient numerical scheme, which is ab-initio by
construction, i.e. the scheme requires no additional assumptions
(e.g. no sampling). The numerical scheme is also accurate: it
produces configurations whose validity, as actual optimal noise
configurations, can be verified theoretically. Finally, the
instanton scheme is also generic, in that there are no
restrictions related to the channel or decoding.

Error-correction scheme. A message word consisting of
$K$ bits is encoded in an $N$-bit long codeword, $N > K$. In the
case of binary, linear coding, a convenient representation of
the code is given by $M \ge N - K$ constraints, often called
parity checks or simply checks. Formally,
$\sigma = (\sigma_1, \ldots, \sigma_N)$ with $\sigma_i = \pm 1$
is one of the $2^K$ codewords if and only if
$\prod_{i \in \alpha} \sigma_i = 1$ for all checks
$\alpha = 1, \ldots, M$, where $i \in \alpha$ if the bit $i$
contributes to the check $\alpha$. The relation between bits and
checks (we use $i \in \alpha$ and $\alpha \ni i$ interchangeably)
is often described in terms of the $M \times N$ parity-check
matrix $H$ consisting of ones and zeros: $H_{\alpha i} = 1$ if
$i \in \alpha$ and $H_{\alpha i} = 0$ otherwise. A bipartite
graph representation of $H$, with bits marked as circles, checks
marked as squares and edges corresponding to the respective
nonzero elements of $H$, is usually called the Tanner graph of
the code. For an LDPC code $H$ is sparse, i.e. most of the
entries are zeros. Transmitted through a noisy channel, a
codeword gets corrupted due to the channel noise, so that the
channel output (at the receiver) is $x \ne \sigma$. Even though
information about the original codeword is lost at the receiver,
one still possesses the full probabilistic information about the
channel, i.e. the conditional probability, $P(x|\sigma')$, for a
codeword $\sigma'$ to be a preimage of the output word $x$, is
known. In the case of independent noise samples the full
conditional probability can be decomposed into the product,
$P(x|\sigma') = \prod_i p(x_i|\sigma'_i)$. A convenient
characteristic of the channel output at a bit is the so-called
log-likelihood, $h_i = \log[p(x_i|+1)/p(x_i|-1)]/(2s^2)$,
measured in the units of the SNR squared, $s^2$. (In the physics
formulation $h$ is called the magnetic field.) The decoding
goal is to infer the original message from the received output
$x$. ML decoding (which generally requires an exponentially
large number, $2^K$, of steps) corresponds to finding the most
probable transmitted codeword given $x$. Belief Propagation
(BP) decoding constitutes a fast (linear in $K$, $N$) yet
generally approximate alternative to ML. As shown in [1],
the set of equations describing BP becomes exactly equivalent
to the so-called symbol Maximum-A-Posteriori (MAP) decoding
in the loop-free approximation (a similar construction
in physics is known as the Bethe-tree approximation),
while in the low-noise limit, $s \to \infty$, ML and MAP become
indistinguishable and the BP algorithm reduces to the min-sum
algorithm:

$$\eta_{i\alpha}^{(n+1)} = h_i + \sum_{\beta \ni i}^{\beta \ne \alpha} \prod_{j \in \beta}^{j \ne i} \mathrm{sign}\big[\eta_{j\beta}^{(n)}\big] \min_{j \in \beta}^{j \ne i} \big|\eta_{j\beta}^{(n)}\big|, \qquad (1)$$

where the message field $\eta_{i\alpha}^{(n)}$ is defined on the
edge that connects bit $i$ and check $\alpha$ at the $n$-th step
of the iterative procedure and $\eta_{i\alpha}^{(0)} = 0$. The
result of decoding is determined by magnetizations, $m_i^{(n)}$,
defined by the right-hand side of Eq. (1) with the restriction
$\beta \ne \alpha$ dropped. The BER at a given bit $i$ becomes

$$B_i = \int dx\, \theta(-m_i\{x\})\, P(x|\mathbf{1}), \qquad (2)$$

where $\theta(z) = 1$ if $z > 0$ and $\theta(z) = 0$ otherwise;
$\sigma = \mathbf{1}$ is assumed for the input (since in a
symmetric channel the BER is invariant with respect to the
choice of the input codeword).
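
The iterative procedure just described can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: it assumes the standard min-sum update (each bit-to-check message equals the local field plus the check outputs from all other checks of that bit, where a check output is the product of signs times the minimum magnitude of the messages from the other bits of the check), it starts from zero messages, and it requires every check to involve at least two bits. The function name and the toy matrix used in the usage note are ours.

```python
def min_sum_magnetizations(H, h, n_iter):
    """Run n_iter min-sum iterations and return the magnetizations m_i."""
    n_checks, n_bits = len(H), len(H[0])
    checks_of = [[a for a in range(n_checks) if H[a][i]] for i in range(n_bits)]
    bits_of = [[i for i in range(n_bits) if H[a][i]] for a in range(n_checks)]
    # eta[(i, a)] is the message on the edge bit i -> check a; initially zero.
    eta = {(i, a): 0.0 for i in range(n_bits) for a in checks_of[i]}

    def check_output(a, i):
        # Product of signs times minimum magnitude over bits j != i of check a.
        sign, magnitude = 1.0, float("inf")
        for j in bits_of[a]:
            if j != i:
                value = eta[(j, a)]
                if value < 0:
                    sign = -sign
                magnitude = min(magnitude, abs(value))
        return sign * magnitude

    m = list(h)
    for _ in range(n_iter):
        new_eta = {}
        for i in range(n_bits):
            incoming = {b: check_output(b, i) for b in checks_of[i]}
            # Magnetization: sum over all checks of bit i (no check excluded).
            m[i] = h[i] + sum(incoming.values())
            for a in checks_of[i]:
                # Message to check a excludes the contribution coming from a.
                new_eta[(i, a)] = m[i] - incoming[a]
        eta = new_eta  # all messages are updated simultaneously
    return m
```

On a loop-free toy graph such as the 3-bit repetition code `[[1, 1, 0], [0, 1, 1]]` with fields `[1.0, -0.2, 0.5]`, three iterations drive every magnetization to the sum of the fields, 1.3, so the weakly negative middle bit is corrected. A bit is decoded in error when its magnetization is negative, which is the condition entering Eq. (2).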

When the BER is small, the integral over output configurations
$x$ in Eq. (2) is approximated by
$B_i \sim P(x_{\rm inst}|\mathbf{1})$, where $x_{\rm inst}$ is the
special instanton configuration of the output maximizing
$P(x|\mathbf{1})$ under the error-surface condition,
$m_i\{x\} = 0$. For the common model of the white symmetric
Gaussian channel,
$p(x|\sigma) = \exp(-s^2(x-\sigma)^2/2)/\sqrt{2\pi/s^2}$, finding
the instanton, $\varphi_{\rm inst} = \mathbf{1} - x_{\rm inst} = l(u)u$,
turns into minimizing the length $l(u)$ with respect to the unit
vector in the noise space $u$, where $l(u)$ measures the distance
from the zero-noise point to the point on the error surface
corresponding to $u$.

Finding the instanton numerically. In our numerical
scheme, the value of the length $l(u)$ for any given unit vector
$u$ was found by the bisection method. The minimum of $l(u)$ was
found by a downhill simplex method, also called "amoeba" [26],
with annealing accurately tailored for better convergence.
The numerical instanton method was first successfully
verified in [27] against analytical loop-free results.
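
The bisection step for $l(u)$ can be sketched as below. This is a minimal illustration, not the authors' code: the decoder is abstracted into a hypothetical predicate `is_error(x)`, the output along the ray is taken as $x = \mathbf{1} - l\,u$, and the error region is assumed to be entered only once along each ray. The toy predicate stands in for ML decoding of a 3-bit repetition code, for which the error surface is the plane $\sum_i x_i = 0$.

```python
import math

def error_length(u, is_error, l_max=100.0, tol=1e-12):
    """Bisection for the distance l(u) from the zero-noise point to the error
    surface along the unit vector u. Returns None if no error occurs up to
    l_max. Assumes the ray crosses the error surface only once."""
    def output(length):
        return [1.0 - length * ui for ui in u]  # x = 1 - l * u

    if not is_error(output(l_max)):
        return None
    low, high = 0.0, l_max  # no error at low, error at high
    while high - low > tol:
        mid = 0.5 * (low + high)
        if is_error(output(mid)):
            high = mid
        else:
            low = mid
    return 0.5 * (low + high)

def toy_is_error(x):
    """Hypothetical stand-in decoder: ML error for the 3-bit repetition code."""
    return sum(x) <= 0.0

u_diag = [1.0 / math.sqrt(3.0)] * 3  # noise spread evenly over the three bits
```

For this toy surface, `error_length(u_diag, toy_is_error)` converges to $\sqrt{3}$, while the single-bit direction `[1.0, 0.0, 0.0]` gives 3. The minimal $l^2 = 3$ equals the Hamming distance of the repetition code, in line with the ML estimate quoted in the text. In the full scheme an outer minimizer (the downhill simplex in the text) would vary $u$ and call this routine at each trial direction.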

Our demonstrative example is the (155,64,20) LDPC code.
(The parity check matrix of the code is shown in Fig. S1 of
Appendix A.) The code includes 155 bits and 93 checks. Each bit
is connected to three checks while any check is connected to
five bits. The minimal Hamming distance of the code is
$l^2_{ML} = 20$, i.e. at $s \gg 1$, and if the decoding is ML,
BER becomes $\sim \exp(-20 \cdot s^2/2)$. (See Fig. S2 of
Appendix A for Monte Carlo evaluation of BER vs SNR for the
code.) We aim to find and describe the instanton(s) that
determine BER in the error-floor regime (for min-sum decoding):
$\sim \exp(-l^2_{ef} \cdot s^2/2)$ with $l^2_{ef} < l^2_{ML} = 20$.
Our numerical, and subsequent theoretical, analyses suggest that
the instantons, as well as $l_{ef}$, do depend on the number of
iterations. We do not detail this rich dependence here, focusing
primarily on the already nontrivial case of four iterations.

The instanton with the minimal length,
$l_a^2 = 46^2/210 \approx 10.076$, is shown in the upper part of
Fig. 1A, see also Fig. S3 of Appendix A. Everywhere away from the
12-bit pattern the noise is numerically zero. The resulting
nonzero noise values are proportional to integers (within
numerical precision). If decoding starts from the instanton
configuration of the noise, magnetization is exactly zero at the
bit number "77". This minimal length instanton controls BER at
$s \to \infty$; however, for any large but finite $s$ one should
also account for many other "close" instantons with
$l(u) \approx l_a$, thus approximating
$B_i \sim \sum_{\rm inst} P(x_{\rm inst}|\mathbf{1})$. Two instanton
configurations shown in Fig. 1B and Fig. 1C represent two local
minima, $l_b^2 = 806/79 \approx 10.203$ and
$l_c^2 = 44^2/188 \approx 10.298$ respectively, that are the
closest to the minimal one. (See also Appendix A Figs. S4-S5.)
These instantons were found as a result of multiple attempts at
"amoeba" minimization.
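
The three quoted weights can be checked with exact rational arithmetic. The short Python sketch below reads them as squared lengths, $46^2/210$, $806/79$ and $44^2/188$; this reading is an interpretation of the values quoted above, chosen because it reproduces the stated decimals.

```python
from fractions import Fraction

# Squared instanton lengths as read from the text (labels a, b, c follow Fig. 1).
squared_lengths = {
    "a": Fraction(46 ** 2, 210),
    "b": Fraction(806, 79),
    "c": Fraction(44 ** 2, 188),
}

for label, value in squared_lengths.items():
    # Print the reduced fraction and its decimal value to three places.
    print(label, value, round(float(value), 3))
```

All three values lie well below the ML value of 20. In lowest terms the fractions have denominators 105, 79 and 47, which coincide with the units (1/105, 1/79 and 1/47) quoted in the captions of Figs. S3-S5.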

FIG. 1: Parts of the full Tanner graph with nonzero noise for the instantons, corresponding to (a) simple, (b) degenerate and (c)
sign-alternating pseudo-codewords, are shown in three panels each consisting of three diagrams. Bits are numbered according to the
(155,64,20)-code definition (top left and bottom) and the noise level (top right, where the area of a bit/circle is proportional to the
corresponding number). For the computational tree (bottom panel) the bits drawn in color participate in the pseudo-codeword and the
shaded bit marks the error position. The marked checks/squares correspond to the points of (b) degeneracy and of (c) sign-alternation.

FIG. 2: Interpretation of the instanton as a median within a set of pseudo-codewords. Three panels show the set of pseudo-codewords for
the three instantons described in Fig. 1. Bits on the computational tree painted in white/black correspond to +1/-1. Other notations/marks
are in accordance with the caption of Fig. 1.

Interpretation of the instantons found. The remarkable
integer/rational structure of the instantons found numerically
by "amoeba" admits a theoretical explanation. Our algebraic
construction generalizes the computational tree approach of
Wiberg [12]. The computational tree is built by unwrapping
the Tanner graph of a given code into a tree from a bit for
which we would like to determine the probability of error.
(The erroneous bit is shaded in Fig. 1.) The number of
generations in the tree is equal to the number of BP iterations
(for more details see [11]). As observed in [12], the result of
decoding at the shaded bit of the original code is exactly equal
to the decoding result in the tree center. It should be noted
that once magnetic fields representing an instanton are
distributed on the tree, one can verify directly (by propagating
messages from the leaves to the tree center) that the algorithm
produces zero magnetization at the tree center. Any check node
processes messages coming from the tree periphery in the
following way: (i) the message with the smallest absolute value
(we assume no degeneracy in the beginning) is passed, (ii) the
source bit of the smallest message is colored, and (iii) the sign
of the product of inputs is assigned to the outcome. At any
bit that lies on the colored leaves-to-center path the incoming
messages are summed up. The initial messages at any bit of
the tree are magnetic fields and, therefore, the result obtained
in the tree center is a linear combination of the magnetic fields
with integer coefficients. The integer $n_i$ corresponding to bit
$i$ of the original graph is the sum of the signatures over all
colored replicas of $i$ on the computational tree. Therefore, the
condition at the tree center becomes $\sum_i n_i h_i = 0$.
Returning to the original graph and maximizing the integrand of
Eq. (2) with the condition enforced we arrive at the following
expressions for the instanton configuration and the effective
weight, respectively:

$$\varphi_i = n_i \Big(\sum_j n_j\Big) \Big/ \Big(\sum_j n_j^2\Big), \qquad l^2 = \Big(\sum_i n_i\Big)^2 \Big/ \Big(\sum_i n_i^2\Big), \qquad (3)$$

where the equation applies to the Gaussian channel; however,
its generalization to any other channel is straightforward.
One can check directly (e.g. looking at Fig. 1A) that
Eqs. (3) are satisfied for the minimum weight instanton. In
this case we find that the signature of any colored message
before and after processing through a check remains intact,
and thus the resulting $n_i$ for any colored bit is just the
total number of the bit's replicas. The structure of this
instanton is exactly equivalent to one of the codewords on the
computational tree, called a pseudo-codeword as generically it
does not correspond to a codeword on the original graph [12].
However, Eq. (3) also suggests another possibility that goes
beyond the standard pseudo-codeword construction [12]. In the
case shown in Fig. 1C the colored part of the tree does
correspond to a pseudo-codeword (by structure); however, the pale
part of the computational tree cannot be neglected as the noise
values at these nodes are nonzero. This peculiarity is due to
the fact that some of the checks shown in the upper part of
Fig. 1C are connected to more than two colored bits. One
finds that the signature of the message propagating from bit
"75" to bit "127" alternates because of the pale "0" lying on
a leaf, $8/47 = |h_0| > |h_{75}| = 3/47$, $h_0 = -8/47 < 0$. This
modifies $n_{75}$, making it equal to 4, as one of the 6 replicas
of the bit "75" contributes -1 instead of +1 to the total count.
Moreover, looking at Fig. 1B one finds that the instanton can
be even more elaborate as the number of replicas for some
bits becomes fractional. (The "+"-sign on Fig. 1B corresponds to
+7/18.) This is actually the degenerate case with a colored
structure bifurcating at a check (connected to the bits "0", "77"
and "36") so that the messages entering the check from two
distinct peripheries have different signatures but are exactly
equal to each other in absolute value, $h_0 = -h_{77} = 18/79$.
Eq. (3) does not work for this case, but the following
generalization corrects the problem: one needs to introduce an
additional condition accounting for the degeneracy. In our
example this extra condition can be simply stated as
$h_0 = -h_{77}$. (See Appendix A Fig. S6.)
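
For the non-degenerate case, the expressions for the instanton noise and the effective weight follow from minimizing $\sum_i \varphi_i^2$ under the single linear condition $\sum_i n_i (1 - \varphi_i) = 0$. The Python sketch below evaluates them in exact arithmetic and lets one verify the condition directly. The integer vector used in the usage note is hypothetical and is not the replica count of any instanton of the (155,64,20) code.

```python
from fractions import Fraction

def instanton_from_counts(n):
    """Given integer coefficients n_i, return the noise phi_i and weight l^2
    for the Gaussian channel, assuming a single (non-degenerate) constraint."""
    total = sum(n)                          # sum_i n_i
    total_sq = sum(k * k for k in n)        # sum_i n_i^2
    phi = [Fraction(k * total, total_sq) for k in n]
    l_squared = Fraction(total * total, total_sq)
    return phi, l_squared
```

With the hypothetical counts `n = [1, 2, 3]` one obtains `phi = [3/7, 6/7, 9/7]` and `l_squared = 18/7`; the fields $h_i = 1 - \varphi_i$ then satisfy $\sum_i n_i h_i = 0$ exactly, and $\sum_i \varphi_i^2$ equals the returned weight. The rational values make it clear why the numerically found noise values are proportional to integers.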

Instantons also allow for a complementary interpretation.
A decoding error occurs when the magnetization in the
computational tree center, which can be considered as a sum over
all pseudo-codewords weighted by $\exp(-2s^2 \sum_i h_i p_i)$,
turns to zero (with $p_i$ being the number of bit $i$ replicas
with -1 sign in the pseudo-codeword). In the case of high SNR
(large magnetic fields) the sum is dominated by the
pseudo-codewords of maximal weight. Therefore, any instanton, as
a configuration of magnetic fields, should be equidistant from
some set of $k \ge 2$ pseudo-codewords:
$\sum_i h_i p_i^{(1)} = \cdots = \sum_i h_i p_i^{(k)}$, where at
least one of them has +1 value and at least one has -1 value in
the tree center to achieve zero magnetization. And indeed the
set of relevant pseudo-codewords for the (155,64,20) code
example, shown in Fig. 2, is a pair in the cases (a), (c) and a
triple in the case (b). (See Appendixes.)

To conclude, in this Letter we demonstrated that the instanton
approach is a powerful, practical and generic instrument for
quantitative analysis of the error floor. The success makes us
confident that this novel method will be indispensable for the
future design of good and practical error-correcting schemes.
APPENDIX A: SUPPLEMENTARY FIGURES

Figure S1. Parity check matrix H for the (155,64,20) LDPC
code. The matrix consists of 3 x 5 blocks. Each block is a
square 31 x 31 matrix. Empty/filled elements of the matrix
stand for 0/1. Bits are numbered from "0" to "154". The
girth of this code is eight.




Figure S2. Frame-Error-Rate (FER) vs SNR$^2$ for the
(155,64,20) code and Belief Propagation decoding. The
filled/empty circle-marks correspond to the results of Monte
Carlo evaluation of FER for 4/1024 iterations of BP. The
straight/dashed line corresponds to the (a)-instanton
asymptotic, $\exp[-(46^2/210) \cdot s^2/2]$, and the ML
asymptotic, $\exp[-20 \cdot s^2/2]$.
Figure S3. Min-sum decoding for the (a) instanton on the
computational tree. All numbers are in units of 1/105. The
numbers shown on the bits/circles are the values of the
magnetic fields, $h$. The number shown next to each check/square
is the message, $\eta$, arriving at the check on the $q$-th step
of the iteration procedure, where $4 - q$ corresponds to the
number of bits separating this check from the tree center. Other
notations/marks are in accordance with the caption of Fig. 1 of
the main text.




Figure S4. Min-sum decoding for the (b) instanton on the 
computational tree. All numbers are in units of 1/79. Other 
notations/marks and explanations are in accordance with cap- 
tions of Fig. 1 of the main text and Fig. S3. 



Figure S5. Min-sum decoding for the (c) instanton on the 
computational tree. All numbers are in the units of 1/47. 
Other notations/marks and explanations are in accordance 
with captions of Fig. 1 of the main text and Fig. S3. 




Figure S6. Geometrical interpretation of the degeneracy
observed in case (b). The plot shows distance for the noise
configuration in case (b), minimized with respect to all the
magnetic fields except $h_0$ and $h_{77}$, vs the two remaining
fields. The relevant part of the $(h_0, h_{77})$ plane is the
quadrant $h_0 < 0$, $h_{77} > 0$. Looking for the instanton in the
$|h_0| < |h_{77}|$ domain and thus assuming that the "0" bit
connected to the red check in Fig. 1B of the main text is colored
while bit "77" is pale, one gets the paraboloid, shown in pink in
the Figure, that achieves its minimum outside of the domain, i.e.
in the $|h_0| > |h_{77}|$ semi-quadrant. Similarly, if one looks
for the instanton in the $|h_0| > |h_{77}|$ domain one again finds
an inconsistency, as the respective paraboloid, shown in yellow in
the Figure, achieves its minimum in the $|h_0| < |h_{77}|$
semi-quadrant. Therefore, the point of actual minimum, shown by
the green dot on the Figure, lies exactly at the minimum of the
angled join of the two domains, $|h_0| = |h_{77}|$.

APPENDIX B: INSTANTONS FOR THE MIN-SUM 
DECODING 

These notes consist of three parts. The first part is devoted 
to explaining how the entire instanton family for an arbitrary 
LDPC code decoded by the min-sum algorithm can be fully 
characterized using the computational tree approach. The sec- 
ond part describes an alternative exposition which allows one 
to represent an instanton as a configuration of magnetic fields 
equidistant from some set of codewords on the computational 
tree. The third part formulates a relation between the theo- 
retical and numerical approaches and suggests challenges and 
questions that need to be addressed in the future. 

Colored/signature structure and constraint minimization 

The basic object for our construction is the computational
tree and its colored/pale/uncolored parts (as briefly introduced
in the main text). The computational tree is a tree constructed
by a simple unwrapping of the Tanner graph of the code into
a tree starting from the bit where the BER is calculated. The
number of generations on the tree is equal to the number of
iterations of the decoder, $n_{it}$. If $n_{it}$ is larger than,
or equal to, a quarter of the code girth (defined as the length
of the shortest loop of the original Tanner graph, measured by
the number of edges within the loop), then the computational tree
contains more than one replica of some of the bits of the
original code.
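
The unwrapping and the appearance of replicas can be illustrated with a short Python sketch. It is a simplified illustration under our own conventions: one generation is taken to mean one bit-check-bit step away from the root, the walk never returns through the check it arrived from, and the toy graph (a single loop of four bits and four checks, hence girth eight) is hypothetical.

```python
from collections import Counter

def replica_counts(H, root, n_generations):
    """Unwrap the Tanner graph of H into a computational tree rooted at bit
    `root` and count how many replicas of each original bit appear."""
    n_checks, n_bits = len(H), len(H[0])
    checks_of = [[a for a in range(n_checks) if H[a][i]] for i in range(n_bits)]
    bits_of = [[i for i in range(n_bits) if H[a][i]] for a in range(n_checks)]

    counts = Counter({root: 1})
    frontier = [(root, None)]  # (bit, check through which the bit was reached)
    for _ in range(n_generations):
        next_frontier = []
        for bit, parent_check in frontier:
            for check in checks_of[bit]:
                if check == parent_check:
                    continue  # do not walk back towards the tree center
                for child in bits_of[check]:
                    if child != bit:
                        counts[child] += 1
                        next_frontier.append((child, check))
        frontier = next_frontier
    return counts

# Hypothetical toy graph: check a connects bits a and (a + 1) mod 4; girth is 8.
H_RING = [[1, 1, 0, 0],
          [0, 1, 1, 0],
          [0, 0, 1, 1],
          [1, 0, 0, 1]]
```

With one generation every bit reached from bit 0 appears once, while with two generations (a quarter of the girth) bit 2 is reached along both sides of the loop and appears twice, in agreement with the criterion stated above.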

Consider an arbitrary configuration of magnetic field, $h$ (or
noise field, $\varphi$), on the Tanner graph and on the
computational tree respectively. Calculating the magnetization
(or, switching from physics jargon to communication theory
jargon, the a-posteriori log-likelihood) at the $n_{it}$-th
iteration in the center of the tree one derives
$m^{(n_{it})}_{\rm center} = h \cdot n$. Here $n_i$, defined on a
bit of the original Tanner graph, is an integer. It is a sum of
contributions, each originating from the respective bit/replica
on the computational tree. Colored are bits on the tree that
contribute the integer +1 or -1. (In the Figures of the main text
and Supporting Figures we use different colors to identify bits in
the computational tree originating from different bits of the
original Tanner graph.) A colored bit on the computational tree
has the signature +1/-1 if it contributes +1/-1 to the integer
associated with the respective bit on the Tanner graph. Uncolored
bits (i.e. bits not shown in the Figures) or pale bits (i.e.
bits shown pale in the Figures) on the tree do not contribute
to the respective integer (one may also say that the respective
contribution is just zero) according to the min-part of the
min-sum rules described in Eq. (1) of the main text. (We draw a
bit on the computational tree in pale if it does not contribute
to the respective integer while at least one of its siblings,
i.e. bits on the tree originating from the same bit of the
original Tanner graph, does contribute to the integer.)

Let us now describe how an individual contribution of a
colored bit on the tree to the respective integer (that is +1
or -1) is calculated. We aim to calculate the contribution to
magnetization at the tree center counting integers according
to the min-sum rule. We assign signatures to the colored
leaves of the tree and start an iterative procedure which
assigns signatures moving from the tree leaves towards the tree
center. Consider the case when at a certain step of the
iteration procedure a check receives messages from some number
of bits among which only one is colored. Then one calculates
the product of the signatures associated with the messages this
check receives from the remaining bits. If the resulting
product is +1/-1 the signature of the colored bit is +1/-1 and
the signatures of the colored bits, lying on the tree branch
grown from the given colored bit, do not/do change. The other
possibility (that will be called degenerate) is that a check
receives two (or more) messages that all have the same absolute
value, which is also the minimal of all the messages received.
Then, one has the freedom to color only one of the degenerate
bits with a colored branch grown from it with the signatures
assigned as described above. The iterative procedure of
the signature assignment is terminated once the tree center is
reached. One calls a check marked if it lies in between two
colored bits of different signatures.

The union of all colored bits, i.e. bits contributing to
$m^{(n_{it})}_{\rm center}$, is called the colored structure. Any
check connected to the colored structure is actually connected to
two bits of the structure. Another important characteristic
complementing the notion of the colored structure on the
computational tree is the set of aforementioned signatures
$\pm 1$ associated with any bit of the colored structure. In a
degenerate case, one finds multiple colored/signature structures
associated with a given configuration of magnetic fields. Each
degenerate structure will actually correspond to a distinct
linear combination of magnetic fields equal to
$m^{(n_{it})}_{\rm center}$. Therefore, of the whole variety of
possible degenerate colored/signature structures corresponding to
the same magnetic field one can always select a set of linearly
independent ones.

So far, this description was generic, i.e. not restricted to a
specific configuration of the magnetic field. Let us now fix
the family of linearly independent colored/signature structures
just explained and allow variation in the value of magnetic
fields. Our goal here is to find an instanton conditioned to
the specific form of the family of the linearly independent
colored/signature structures. Finding the instanton means
minimizing $l^2 = (\mathbf{1} - h)^2$ with respect to $h$ under
the additional set of linearly independent conditions,
$m^{(\mu)} = h \cdot n^{(\mu)} = 0$, where $\mu$ is an index
enumerating these conditions, each corresponding to a certain
colored structure. (The expression presented above for the length
$l$ applies to the white Gaussian channel; however, generalization
to any other channel is straightforward.) The resulting
expression for the optimal configuration of the noise,
$\varphi = \mathbf{1} - h$, is

$$\varphi = \sum_{\mu,\nu} n^{(\mu)} \big(G^{-1}\big)_{\mu\nu} \sum_i n_i^{(\nu)}, \qquad G_{\mu\nu} = n^{(\mu)} \cdot n^{(\nu)}. \qquad (S1)$$

This expression (also generalizing Eq. (3) of the main text for
an arbitrary type of instanton degeneracy) should be checked
for consistency with the family of the colored/signature
structures assumed for the instanton. If the consistency check is
met, the instanton construction is completed.
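
The constrained minimization with several linearly independent structures reduces to solving a small linear system for the Lagrange multipliers, $G\lambda = c$ with $G_{\mu\nu} = n^{(\mu)} \cdot n^{(\nu)}$ and $c_\mu = \sum_i n_i^{(\mu)}$, followed by $\varphi = \sum_\mu \lambda_\mu n^{(\mu)}$. The Python sketch below does this in exact arithmetic for the Gaussian channel. It is an illustration only: the function name is ours and the two structures in the usage note are hypothetical, chosen to mimic a degenerate case in which the structures differ at two bits.

```python
from fractions import Fraction

def constrained_instanton(structures):
    """Minimize |phi|^2 subject to n^(mu) . (1 - phi) = 0 for every structure.
    `structures` is a list of linearly independent integer vectors n^(mu).
    Returns the noise phi and the squared length l^2."""
    k, n_bits = len(structures), len(structures[0])
    gram = [[Fraction(sum(a * b for a, b in zip(structures[m], structures[v])))
             for v in range(k)] for m in range(k)]
    rhs = [Fraction(sum(n)) for n in structures]

    # Solve gram * lam = rhs by Gauss-Jordan elimination on the augmented matrix.
    aug = [row[:] + [value] for row, value in zip(gram, rhs)]
    for col in range(k):
        pivot_row = next(r for r in range(col, k) if aug[r][col] != 0)
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [value / pivot for value in aug[col]]
        for r in range(k):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    lam = [aug[r][k] for r in range(k)]

    phi = [sum(lam[m] * structures[m][i] for m in range(k)) for i in range(n_bits)]
    l_squared = sum(p * p for p in phi)
    return phi, l_squared
```

For the hypothetical pair `[[1, 1, 0], [1, 0, -1]]` the result is `phi = [2/3, 4/3, 2/3]` with `l_squared = 8/3`, and the fields $h = \mathbf{1} - \varphi = (1/3, -1/3, 1/3)$ satisfy both constraints. Subtracting the two constraints gives $h_1 = -h_2$, the same kind of equal-magnitude, opposite-sign condition that characterizes the degenerate case in the text. With a single structure the routine reduces to the non-degenerate expressions.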

Let us now demonstrate how this formal description works
for the three instanton examples (a), (b) and (c) described in
Figs. 1, 2 of the main text and also illustrated in Figs. S3-S6.
Instantons (a) and (c) are both explained by a single colored
structure. For the (a) instanton each bit contributes +1 to the
respective component of $n$. For the (c) example all
contributions are +1 except for the one coming from the "75" bit
connected to the marked check. Since $h_0 < 0$, the message
originating from this bit contributes with the opposite sign
to the magnetization. There exist 6 replicas of the "75" bit
on the computational tree; however, taking into account that
one replica of the bit contributes -1, one finds that the actual
value of the noise is $\propto 4$ rather than $\propto 6$. Since
$|h_0| > |h_{75}|$, the "0" bit is pale and thus does not
contribute to the magnetization. Considering the (b) instanton
one finds that this is a degenerate example, with
$|h_0| = |h_{77}|$. There are two linearly independent
colored/signature structures describing the (b) instanton. The
two structures are different only at the two bits on the tree
leaves shown in Fig. 1B of the main text adjacent to the marked
check. The first colored structure does not contain bit "77"
(zero contribution to the magnetization) while bit "0"
contributes +1 to the respective component of $n$. The second
structure does not contain bit "0" while bit "77" contributes
-1 to the respective component of $n$ (simply because $h_0 < 0$,
thus forcing the respective message to contribute to the
magnetization with the opposite sign).

One natural question to ask about the degenerate case (b) is
the following: can one of the two colored structures describing
the instanton form its own non-degenerate instanton? The answer
is negative. Indeed, the colored/signature structure generates
(through the minimization procedure described above) a
configuration of magnetic fields that is not consistent with the
colored/signature structure one started from. Considering the
structure with the "0" bit connected to the marked check being
pale (and thus the signature field associated with the colored
bit "77" connected to the red check being -1) one finds that the
magnetic field minimizing $l$ will actually be inconsistent with
the colored/signature structure. Considering the other
configuration (the "0" bit is colored with +1 signature while the
"77" bit is pale) one finds again that the resulting magnetic
field is inconsistent with the colored structure. To resolve this
inconsistency one needs to account for the two configurations
simultaneously, thus introducing two constraints, not one. The
degeneracy of the (b) instanton is illustrated in Fig. S6.

Let us also notice that the discrete nature of the consistency
check (yes/no answer as a result) puts degenerate configurations
on equal footing (in the sense of counting all possibilities)
with the non-degenerate configurations: considering the
family of all possible instantons for the given computational
tree one finds that the number of degenerate instantons is
comparable with the number of non-degenerate instantons.

Instantons as medians between pseudo-codewords 

We consider an instanton with the set of linearly indepen- 
dent structures already established. Each colored structure, 
indexed by /j, corresponds to the constraint, h ■ n'^' — 0, im- 
posed on the magnetic field, h. Each of the constraints can ac- 
tually be reformulated in terms of a pair of pseudo-codewords 
on the computational tree: 

^ /,,a(^^+' = £ h,a'r^-\ (S2) 

i'Gti'ee /Gtree 

where $i$ stands for the index assigned to a bit on the
computational tree; the magnetic field on the computational
tree bit is equal to the magnetic field defined on the
respective bit of the original graph; and the pseudo-codewords,
$\sigma^{(\mu,\pm)}$, are the two distinct configurations of the
binary field, $\sigma_i = \pm 1$, defined on each bit $i$ of the
computational tree that satisfy all the checks on the
computational tree.
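
The link between Eq. (S2) and a linear constraint on the magnetic field can be made explicit by collecting the tree bits that are copies of the same bit of the original graph. In the sketch below, the label $\alpha$ for a bit of the original graph, the notation $i \to \alpha$ for "tree bit $i$ is a copy of $\alpha$", and the normalization factor $1/2$ are assumptions introduced for illustration; the normalization of $n^{(\mu)}$ used elsewhere in these Notes may differ.

```latex
\sum_{i \in \mathrm{tree}} h_i \left( \sigma_i^{(\mu,+)} - \sigma_i^{(\mu,-)} \right) = 0
\quad \Longrightarrow \quad
\sum_{\alpha} h_\alpha \, n_\alpha^{(\mu)} = 0,
\qquad
n_\alpha^{(\mu)} \equiv \frac{1}{2} \sum_{i \to \alpha}
\left( \sigma_i^{(\mu,+)} - \sigma_i^{(\mu,-)} \right).
```

The first equality is Eq. (S2) with both sums moved to one side; the second uses only the fact that every tree bit carries the magnetic field of its original-graph bit. With this convention, only the tree bits on which the two pseudo-codewords differ contribute to $n^{(\mu)}$.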

Let us now discuss how the pseudo-codewords can be constructed
if the respective constraint $\mu$ described by Eq. (S2) is
already established. If the signature field, described in the
previous Section of the Notes, corresponding to the structure
$\mu$ does not contain a single -1 element, then
$\sigma^{(\mu,+)}$ is the all-unity codeword (+1 on all bits of
the computational tree) and $\sigma^{(\mu,-)}$ is the
pseudo-codeword containing -1 on all the colored bits of the
structure and +1 on all other bits. If, however, the colored
structure does contain some -1 signatures, the situation is
more elaborate, as both pseudo-codewords are nontrivial. The
algorithm that allows restoration of the pseudo-codewords
starts by determining the values of the colored bits for
$\sigma^{(\mu,+)}$, which are set equal to the values of the
signatures. The uncolored bits are assigned values +1. Although
it is possible to determine the bit values of the pale
substructures, the procedure is elaborate and we do not present
it here. Finally, the pseudo-codeword $\sigma^{(\mu,-)}$ is
obtained from $\sigma^{(\mu,+)}$ by changing the signs of the
colored bits, with the uncolored and pale bits remaining the
same.
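
The construction just described can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the dictionary-based encoding of the signature field, and the toy bit indices are all hypothetical. Because the procedure for the pale substructures is not presented here, their values are taken as an input supplied by the caller, and the sketch does not verify that the resulting configurations satisfy the checks of the computational tree.

```python
def build_pseudo_codeword_pair(num_bits, signatures, pale_values=None):
    """Build (sigma_plus, sigma_minus) on a tree with `num_bits` bits.

    signatures  -- dict {bit index: +1 or -1} for the colored bits
    pale_values -- dict {bit index: +1 or -1} for pale bits (assumed to be
                   known from a separate procedure not covered here)
    """
    pale_values = pale_values or {}
    # Uncolored bits are assigned +1.
    sigma_plus = [1] * num_bits
    # Pale bits take the externally supplied values.
    for bit, value in pale_values.items():
        sigma_plus[bit] = value
    # Colored bits of sigma_plus are set equal to the signatures.
    for bit, signature in signatures.items():
        sigma_plus[bit] = signature
    # sigma_minus: flip the colored bits, keep uncolored and pale bits.
    sigma_minus = list(sigma_plus)
    for bit in signatures:
        sigma_minus[bit] = -sigma_plus[bit]
    return sigma_plus, sigma_minus

def energy_gap(h, sigma_plus, sigma_minus):
    """Left-hand side minus right-hand side of Eq. (S2) for a field h
    given bit by bit on the tree; zero when the constraint holds."""
    return sum(hi * (sp - sm) for hi, sp, sm in zip(h, sigma_plus, sigma_minus))

# Case without -1 signatures: sigma_plus is the all-unity codeword.
plus_a, minus_a = build_pseudo_codeword_pair(5, {1: +1, 3: +1})
print(plus_a, minus_a)

# Case with a -1 signature and one pale bit: both configurations are nontrivial.
plus_c, minus_c = build_pseudo_codeword_pair(5, {0: +1, 2: -1}, {4: -1})
print(plus_c, minus_c)
```

The helper `energy_gap` can be used to check whether a candidate magnetic field, listed on the tree bits, satisfies Eq. (S2) for a given pair.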

Fig. 2 of the main text shows three examples of the
pseudo-codeword construction for the three instantons discussed
in the manuscript. Examples (a) and (c) each contain one pair
of competing pseudo-codewords. However, the two cases are
different. In case (a) the colored/signature structure does not
contain -1 bits, thus one pseudo-codeword is just the all-unity
codeword and the other pseudo-codeword contains -1 at all the
bits of the colored structure and +1 at all other bits. In case
(c) the colored/signature structure does contain a -1 bit, thus
resulting in two distinct pseudo-codewords shown in Fig. 2C of
the main text. Example (b) corresponds to the degenerate case
with the two pairs of pseudo-codewords involved in the
conditions (S2). However, one pseudo-codeword enters both
conditions (that is the one shown on the top diagram of Fig. 2B
of the main text), therefore the total count for case (b) gives
three pseudo-codewords that are equidistant from the instanton
configuration of the magnetic fields.

General Remarks 

Let us note that the analysis presented above, in addition to
its theoretical significance, may be helpful for accelerating
the instanton-amoeba numerical procedure, e.g. through guiding
selection at the final stage of the minimization. We also
expect that this theoretical analysis will be instrumental for
formulating the right questions to address by the
instanton-amoeba method, or by other minimization methods
aiming at finding the instanton numerically. In what follows,
we conclude by posing some questions that we have not yet
studied but plan to address in the future.

• More detailed exploration of the phase space, especially
in the context of describing not only the minimal distance
$l_{\min}$ contribution but also the family of other
"low-lying" instantons. The particular question of interest
here is to estimate the "density of states/instantons", that
is, to answer the question: how many instantons are found
within the $\delta l$ vicinity of the one corresponding to
$l_{\min}$?

• Dependence of BER on the number of iterations. As we
already indicated in the main text, our preliminary tests
show that instantons, and thus asymptotic estimates for BER,
do change with the number of iterations. We are interested in
explaining this dependence. We will also test, with our
instanton-amoeba approach, the validity of the graph covers
method suggested recently.
• Dependence on the code length. It is important to analyze
the family of LDPC codes with varying code length, $N$. Of
special interest are the regular LDPC codes where the Hamming
distance grows with $N$, e.g. Margulis codes [?]. Then, the
relevant question is: how does $l_{\min}$ (and other
characteristics of the error floor) change with $N$ for a
given family of codes? This study will essentially lead to
analysis of the finite-size effects, already discussed in the
water-fall domain [?], but not yet explored in the asymptotic
regime of the error floor.

• Does BP/min-sum decoding perform better than other
suboptimal algorithms (that can possibly exist) of the same
complexity, e.g. linear in $N$? Even if the answer is yes
(which is by no means guaranteed), what would be the best
decoding for a higher level of complexity, e.g. $N^{\alpha}$,
where $\alpha > 1$? Once an idea of better decoding is
formulated, our instanton-amoeba toolbox will be
indispensable in answering the aforementioned questions and
also in testing in depth the performance of the new decoding.



• Other types of codes, e.g. turbo codes. Turbo codes show
remarkable performance at moderate SNR but they are also
infamous for demonstrating much higher error floors than LDPC
codes of comparable size. Even though some important
similarities between the LDPC codes and turbo codes are
established [?], the decoders of these two types of codes are
different and it becomes important to analyze the performance
of the turbo scheme, especially in light of the popularity of
turbo codes.

• Other, application-specific, channels. The instanton-amoeba
approach is not limited to the white Gaussian channel, which
we chose primarily for the purpose of demonstration, but can
be applied straightforwardly to other types of channels, e.g.
with correlations among received samples. Of special interest
will be to analyze the performance of fiber-optic
communication channels where the effects of fiber dispersion
[33], birefringence and amplifier noise [34] will be accounted
for. Another two interesting channel types are magnetic and
optical recording channels exhibiting a high level of
nonlinearity and correlations among received samples [?].

• There are many problems in the information and computer
sciences that are different from the standard coding problem
but are also dependent on or sensitive to rare errors.
Therefore, estimating performance/BER in these problems is a
major step required for their comprehensive analysis. Two
interesting examples here are (i) inter-symbol interference,
which is especially challenging in the context of
two-dimensional and three-dimensional information storage,
and (ii) estimating algorithmic errors in the domain of
typically good performance within a combinatorial
optimization K-SAT setting [?].
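
The "density of states/instantons" question raised in the first item above amounts to a simple counting exercise once a list of instanton lengths is available. The sketch below is only an illustration under stated assumptions: the function name is hypothetical, the lengths would in practice come from repeated runs of a numerical instanton search, and the numbers used here are made up.

```python
def count_low_lying(lengths, delta):
    """Return (l_min, count), where count is the number of instantons whose
    length lies within `delta` of the minimal length l_min.
    The minimal instanton itself is included in the count."""
    if not lengths:
        raise ValueError("at least one instanton length is required")
    l_min = min(lengths)
    count = sum(1 for length in lengths if length - l_min <= delta)
    return l_min, count

# Made-up effective lengths of instantons found by some numerical search.
toy_lengths = [3.2, 3.0, 3.05, 4.1]
l_min, n_close = count_low_lying(toy_lengths, delta=0.1)
print(l_min, n_close)
```

Repeating the count for a range of `delta` values gives a cumulative picture of how densely the low-lying instantons are packed above the minimal one.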